perm filename SPINDL.XGP[ESS,JMC] blob sn#214965 filedate 1976-05-09 generic text, type T, neo UTF8
/LMAR=0/XLINE=3/FONT#0=BAXL30/FONT#1=BAXI30/FONT#2=BASB30/FONT#3=SUB/FONT#4=SUP/FONT#5=BASL35/FONT#6=NGR25/FONT#7=XMAS25/FONT#8=FIX25/FONT#9=GACL25



␈↓ α∧␈↓α␈↓ ∧→SPINDL - A PROGRAM FOR FILE COMPRESSION

␈↓ α∧␈↓α␈↓ ¬Wby Robert E. Maas (REM)

␈↓ α∧␈↓␈↓ αTDisk␈αspace␈αin␈αthe␈α
Lab␈αis␈αin␈αshort␈α
supply␈αand␈αwill␈αremain␈α
that␈αway␈αfor␈αthe␈αforseeable␈α
future.
␈↓ α∧␈↓Therefore,␈αtwo␈αmethods␈αof␈αreducing␈αthe␈αamount␈αof␈αspace␈αtaken␈αby␈αfiles␈αare␈αhereby␈α
offered.␈α The
␈↓ α∧␈↓first␈α∂of␈α∂these␈α∂allows␈α⊂combining␈α∂several␈α∂files␈α∂into␈α∂one,␈α⊂and␈α∂the␈α∂second␈α∂uses␈α∂Huffman␈α⊂coding␈α∂to
␈↓ α∧␈↓further␈αreduce␈α
the␈αspace␈αtaken␈α
to␈αslightly␈α
less␈αthan␈αhalf␈α
(in␈αmost␈αcases)␈α
its␈αoriginal␈α
volume.␈α The
␈↓ α∧␈↓two facilities are offered by the same program called SPINDL.

␈↓ α∧␈↓␈↓ αTEach␈αdisk␈αfile␈αconsumes␈αat␈αleast␈αa␈αfull␈αdisk␈αblock␈α(2304␈αwords),␈αeven␈αif␈αit␈αcontains␈αonly␈αone
␈↓ α∧␈↓word␈αof␈αactual␈αdata.␈α Most␈αsuch␈αfiles␈αare␈αtext␈αfiles,␈αand␈αcan␈αnow␈αbe␈αcombined␈αinto␈αa␈α"spindle"␈αso
␈↓ α∧␈↓that␈αmuch␈αfewer␈αdisk␈αblocks␈αare␈αconsumed.␈α Later,␈αas␈αdesired,␈αindividual␈αfiles␈αin␈αthe␈αspindle␈αmay
␈↓ α∧␈↓be␈α∂retrieved.␈α∂ Since␈α∞the␈α∂system␈α∂DIrectory␈α∞command␈α∂measures␈α∂files␈α∞in␈α∂words␈α∂rather␈α∂than␈α∞blocks,
␈↓ α∧␈↓SPINDL will help the system even if the DIrectory command shows no reduction of word count.

␈↓ α∧␈↓␈↓ αTTo use this new facility, type the monitor command

␈↓ α∧␈↓␈↓ αTR SPINDL

␈↓ α∧␈↓The␈αprogram␈α
will␈αannounce␈α
itself␈αand␈α
ask␈αyou␈α
for␈αthe␈αname␈α
of␈αthe␈α
spindle␈αfile␈α
you␈αwant␈α
to␈αuse,
␈↓ α∧␈↓then␈αprompt␈αyou␈αwith␈αa␈α">"␈αat␈αthe␈αleft␈αmargin.␈α You␈αtype␈αa␈αfile␈αname,␈αdefault␈αextension␈αis␈α".SPI".
␈↓ α∧␈↓If␈α
the␈αfile␈α
doesn't␈αalready␈α
exist,␈α
it␈αwill␈α
be␈αcreated.␈α
 The␈α
program␈αwill␈α
now␈αannounce␈α
the␈αnumber␈α
of
␈↓ α∧␈↓files␈αcurrently␈αin␈αthe␈αspindle,␈αand␈αprint␈αa␈α"*"␈αat␈αthe␈αleft␈αmargin␈αto␈αindicate␈αthat␈αit␈αis␈αwaiting␈αfor␈αa
␈↓ α∧␈↓command.␈α If␈αyou␈αtype␈αa␈α"?"␈αfollowed␈αby␈αcarriage-return,␈αit␈αwill␈αprovide␈αa␈αlist␈αof␈αlegal␈αcommands.
␈↓ α∧␈↓The following commands will perform most of the useful functions you will want:

␈↓ α∧␈↓DIrectory -- tells you all files in the spindle.

␈↓ α∧␈↓Spindle -- copies a file from the disk into the spindle.

␈↓ α∧␈↓Unspindle -- copies a file from the spindle to the disk.

␈↓ α∧␈↓DElete -- deletes a file from the spindle (without reclaiming space).

␈↓ α∧␈↓Bubble -- copies the entire spindle, deleting all wasted space such as deleted files.

␈↓ α∧␈↓␈↓ αTAfter␈α∞some␈α∞of␈α∞the␈α∞commands␈α∂(for␈α∞example,␈α∞Spindle,␈α∞Unspindle,␈α∞and␈α∞DElete)␈α∂the␈α∞program
␈↓ α∧␈↓will␈α
ask␈α
you␈α
for␈α
various␈α
items,␈α
such␈α
as␈α
the␈αnames␈α
of␈α
the␈α
files␈α
to␈α
use,␈α
and␈α
the␈α
prompt␈α
for␈αtyping
␈↓ α∧␈↓each␈αsuch␈αitem␈αwill␈αbe␈αa␈α">"␈αat␈αthe␈αleft␈αmargin.␈α When␈αit␈αis␈αall␈αdone,␈αit␈αwill␈αprompt␈α"*"␈αat␈αthe␈αleft
␈↓ α∧␈↓margin indicating it is ready for another top-level command.

␈↓ α∧␈↓CRUNCH-AND-SPINDLE:

␈↓ α∧␈↓␈↓ αTGreater␈αdata␈αcompression␈αat␈α
the␈αcost␈αof␈αmore␈αcomputer␈α
time␈αis␈αobtained␈αwith␈α
the␈αSPINDL
␈↓ α∧␈↓command␈α∞"Crunch".␈α∞ (It␈α∞uses␈α∞Huffman␈α∞coding␈α∞to␈α∞reduce␈α∞the␈α∞volume␈α∞of␈α∞an␈α∞English␈α∞or␈α∞FAIL␈α
or
␈↓ α∧␈↓SAIL,␈α∞etc.␈α∂file␈α∞by␈α∂a␈α∞further␈α∞factor␈α∂of␈α∞two).␈α∂ SPINDL␈α∞will␈α∞ask␈α∂you␈α∞for␈α∂auxiliary␈α∞files␈α∂called␈α∞the


␈↓ α∧␈↓␈↓ ε|1␈↓ ∧



␈↓ α∧␈↓history␈α
tree␈αand␈α
the␈αHuffman␈α
tree,␈αbut␈α
carriage␈α
return␈αwill␈α
use␈αdefault␈α
trees␈α(if␈α
the␈αspindle␈α
already
␈↓ α∧␈↓contains␈αa␈αcrunched␈α
file,␈αdefault␈αis␈α
the␈αsame␈αtrees␈αas␈α
used␈αthe␈αprevious␈α
time,␈αotherwise␈αdefault␈αis␈α
a
␈↓ α∧␈↓particular␈α
pair␈α
of␈α
trees␈α
on␈α
[UP,REM]␈α
which␈αwork␈α
quite␈α
well␈α
with␈α
most␈α
English␈α
language␈αtext␈α
such
␈↓ α∧␈↓as␈αwriteups␈αand␈αessays).␈α Regardless␈αof␈αwhether␈αthe␈αdefault␈αtrees␈αor␈αsome␈αother␈αtrees␈αare␈αused,␈αone
␈↓ α∧␈↓copy␈α
of␈α
each␈α
tree␈α
actualy␈α
used␈αwill␈α
be␈α
copied␈α
into␈α
your␈α
spindle␈αfile␈α
(if␈α
it␈α
isn't␈α
there␈α
already),␈αso␈α
that
␈↓ α∧␈↓crunching␈α
only␈α
achieves␈α
a␈α
reduction␈αof␈α
total␈α
data␈α
if␈α
the␈αfiles␈α
being␈α
crunched␈α
with␈α
a␈αparticular␈α
pair
␈↓ α∧␈↓of␈α∞trees␈α∞total␈α∞at␈α∞least␈α∞1400-3000␈α∞words␈α∞(assuming␈α∞a␈α∞2-to-1␈α∞compression␈α∞and␈α∞assuming␈α∞700-1500
␈↓ α∧␈↓word␈α
trees).␈α
 Naturally,␈α
it␈α
is␈α
pointless␈α
to␈α
spindle␈α
a␈α
single␈α
file␈α
or␈α
to␈α
crunch␈α
a␈α
single␈α
file␈α
unless␈αits␈α
size
␈↓ α∧␈↓exceeds 2.3K.

␈↓ α∧␈↓␈↓ αTAppendix␈α
A␈α
explains␈α
how␈α
to␈α
make␈α
special␈α
trees,␈α
assuming␈α
none␈α
of␈α
the␈α
already-existing␈α
trees
␈↓ α∧␈↓listed␈αin␈α
Appendix␈αB␈α
are␈αadequate.␈α
 (Currently␈αonly␈α
trees␈αfor␈α
English␈αexist,␈α
but␈αsoon␈α
an␈αattempt
␈↓ α∧␈↓will be made to create trees for FAIL, SAIL, LISP, and perhaps LAP and SNOBOL.)

␈↓ α∧␈↓␈↓ αTTo␈α∞uncrunch-and-unspindle,␈α∞just␈α∂use␈α∞the␈α∞"Unspindle"␈α∞command␈α∂and␈α∞the␈α∞right␈α∂thing␈α∞will
␈↓ α∧␈↓happen␈α∂automatically,␈α∂the␈α∂correct␈α∂trees␈α∂will␈α∂be␈α∞selected␈α∂from␈α∂the␈α∂spindle␈α∂file␈α∂according␈α∂to␈α∞their
␈↓ α∧␈↓hash number and used for uncrunching your file.

␈↓ α∧␈↓␈↓ αTTiming:

␈↓ α∧␈↓␈↓ αTSpindle -- 3/4 second per thousand words.

␈↓ α∧␈↓␈↓ αTUnspindle -- 1/2 second per thousand words.

␈↓ α∧␈↓␈↓ αTCrunch-by-pages-and-spindle -- 4 sec to load default tree + 2 sec per K.

␈↓ α∧␈↓␈↓ αTUncrunch-by-pages-and-spindle -- 3.3 sec to load tree + 1 sec per K.

␈↓ α∧␈↓␈↓ αTWhen␈α
should␈αa␈α
file␈α
be␈αSPINDLed?␈α
 Our␈α
current␈αrecommendation␈α
is␈α
that␈αa␈α
file␈αnot␈α
expected
␈↓ α∧␈↓to␈α∞be␈α∞used␈α
in␈α∞30␈α∞days␈α∞should␈α
be␈α∞spindled,␈α∞and␈α∞if␈α
not␈α∞expected␈α∞to␈α∞be␈α
used␈α∞for␈α∞another␈α∞15␈α
days,
␈↓ α∧␈↓crunched␈αtoo.␈α These␈αrecommendations␈αare␈αconservative,␈αand␈αwill␈αchange␈αin␈αthe␈αdirection␈αof␈αmore
␈↓ α∧␈↓prompt spindling when the KL-10 is working.

␈↓ α∧␈↓␈↓ αTMany␈α∂people␈α∂will␈α∂be␈α∂able␈α∂to␈α∂double␈α∞the␈α∂effective␈α∂sizes␈α∂of␈α∂their␈α∂disk␈α∂allocations␈α∂by␈α∞using
␈↓ α∧␈↓these facilities.


␈↓ α∧␈↓α␈↓ ε"APPENDIX A

␈↓ α∧␈↓␈↓ β∧At␈αpresent␈αit␈αis␈αrather␈αa␈αhassle␈αto␈αcreate␈αnew␈αhistory␈αand␈αHuffman␈αtrees␈αfor␈αencoding␈αa
␈↓ α∧␈↓file,␈α∞however␈α
here␈α∞is␈α
an␈α∞explanation␈α
of␈α∞how␈α∞to␈α
do␈α∞it␈α
should␈α∞you␈α
really␈α∞want␈α
to.␈α∞ If␈α∞you␈α
already
␈↓ α∧␈↓have␈αa␈αlist␈αof␈αreversed␈αstrings␈αto␈αuse,␈αor␈αwant␈αto␈αuse␈αone␈αof␈αREM's,␈αskip␈αover␈αsteps␈α1-5,␈α
begin␈αat
␈↓ α∧␈↓step␈α6.␈α If␈αyou␈αalready␈αhave␈αthe␈αtwo␈αPolish␈αfiles␈αyou␈αwant␈αto␈αuse␈αfor␈αcrunching,␈αyou␈αmay␈αbegin␈αat
␈↓ α∧␈↓step 7.

␈↓ α∧␈↓␈↓ αTSTEP 1 -- Create a list of strings occurring most often.

␈↓ α∧␈↓␈↓ β∧RU CRU1[1,REM] -- Starts up the survey program.

␈↓ α∧␈↓␈↓ ε|2␈↓ ∧



␈↓ α∧␈↓␈↓ β∧B␈α--␈αSets␈αthe␈αprogram␈αto␈αBigram/Trigram␈αmode␈αin␈αwhich␈αit␈αscans␈αa␈αfile␈αfor␈αthe␈αstrings
␈↓ α∧␈↓which occur most often.

␈↓ α∧␈↓␈↓ β∧The program will ask you for input file name and write TOKENS.DAT.

␈↓ α∧␈↓␈↓ αTSTEP 2 -- Modify the list of strings using whatever editor and heuristics you choose...

␈↓ α∧␈↓␈↓ αTSTEP␈α3␈α--␈αReverse␈αthe␈αlist␈αof␈αstrings␈αso␈αthat␈αthey␈αread␈αbackwards␈α(from␈αa␈αgiven␈αpoint␈αin␈αa
␈↓ α∧␈↓file back to earlier characters).

␈↓ α∧␈↓␈↓ β∧RU CRU1[1,REM]

␈↓ α∧␈↓␈↓ β∧R -- Sets the program to Reverse mode.

␈↓ α∧␈↓␈↓ β∧TOKENS.DAT -- Tells it the name of the file to read.

␈↓ α∧␈↓␈↓ β∧The program will write SNEKOT.DAT.

␈↓ α∧␈↓␈↓ αTSTEP␈α
4␈α
--␈α
Separate␈α
the␈α
three␈α
parts␈α
of␈α
the␈α
file␈α
and␈α
sort␈α
them␈α
individually␈αinto␈α
lexicographic
␈↓ α∧␈↓sequence.

␈↓ α∧␈↓␈↓ β∧ET SNEKOT.DAT/N

␈↓ α∧␈↓␈↓ β∧Locate␈α
the␈αpoint␈α
where␈α
the␈αstrings␈α
change␈α
from␈αtwo-characters␈α
to␈α
one-character,␈αand␈α
the
␈↓ α∧␈↓point where they change from one-character to three-characters.  Put a page mark at each break.

␈↓ α∧␈↓␈↓ β∧E -- Exit the editor.

␈↓ α∧␈↓␈↓ β∧DO␈αSNE[1,REM]␈α--␈αSelects␈αthe␈αcorrect␈αpages,␈αcreating␈αSN.1␈αSN.2␈αand␈α
SN.3␈αcontaining
␈↓ α∧␈↓strings of a particular length, assuming your edit was correct.

␈↓ α∧␈↓␈↓ β∧R SSORT -- Starts up the String-Sorter program.

␈↓ α∧␈↓␈↓ β∧SN.2 -- Sort the 2-letter strings, when it asks you if it is ok to overwrite, say Y.

␈↓ α∧␈↓␈↓ β∧R SSORT

␈↓ α∧␈↓␈↓ β∧SN.3

␈↓ α∧␈↓␈↓ β∧COPY␈α
<FILE>/N/L←SN.1,SN.2,SN.3␈α--␈α
Whatever␈α
<FILE>␈αyou␈α
want,␈α
check␈αit␈α
to␈αbe␈α
sure
␈↓ α∧␈↓the strings are sorted by length, and in each group alphanumerically.

␈↓ α∧␈↓␈↓ αTSTEP 5 -- Modify the list of reversed strings using whatever editor as you choose...

␈↓ α∧␈↓␈↓ αT[If␈α∩you␈α⊃jumped␈α∩over␈α⊃steps␈α∩1-5,␈α∩there␈α⊃is␈α∩a␈α⊃spindle␈α∩called␈α∩SNEKOT.SPI[1,REM]␈α⊃which
␈↓ α∧␈↓contains various lists of reversed strings you may want to use.]



␈↓ α∧␈↓␈↓ ε|3␈↓ ∧



␈↓ α∧␈↓␈↓ αTSTEP␈α6␈α--␈αScan␈αyour␈αcollection␈αof␈αfiles␈αaccording␈αto␈αthe␈αlist␈αof␈αleft-contexts␈αdefined␈αby␈αthe
␈↓ α∧␈↓reversed strings you have created or found.

␈↓ α∧␈↓␈↓ β∧RU␈αHOTER[HOT,REM]␈αor␈αR␈αPTYJOB␈α--␈αwhichever␈αyour␈αfavorite␈αpty-job␈α
program,
␈↓ α∧␈↓have␈α
it␈αoutput␈α
a␈α
disk␈αfile␈α
so␈αyou␈α
can␈α
save␈αall␈α
the␈αTTY␈α
output␈α
containing␈αthe␈α
Huffman␈α
code,␈αso
␈↓ α∧␈↓that␈α∂you␈α∂can␈α⊂examine␈α∂it␈α∂sometime␈α∂later␈α⊂to␈α∂try␈α∂to␈α⊂figure␈α∂out␈α∂how␈α∂to␈α⊂improve␈α∂it␈α∂by␈α⊂deleting␈α∂or
␈↓ α∧␈↓inserting␈α↔left-contexts.␈α↔ If␈α⊗you␈α↔are␈α↔sure␈α↔you␈α⊗won't␈α↔want␈α↔to␈α⊗examine␈α↔the␈α↔code,␈α↔omit␈α⊗this
␈↓ α∧␈↓PTYJOB/HOTER part and just do the following directly on your terminal.

␈↓ α∧␈↓␈↓ β∧RU CRU1[1,REM]

␈↓ α∧␈↓␈↓ β∧S -- Sets the program to Scan-by-Reversed-Left-Context mode.

␈↓ α∧␈↓␈↓ β∧<FILE> -- The list-of-left-contexts file you laboriously created.

␈↓ α∧␈↓␈↓ β∧The␈αprogram␈αwill␈αask␈αyou␈αone␈αat␈αa␈αtime␈αfor␈αfiles␈αto␈αsurvey␈αaccording␈αto␈αthe␈αlist␈αof␈αleft-
␈↓ α∧␈↓contexts␈αthat␈αwere␈αread␈αin,␈αcompiling␈αa␈αcomplete␈α'200␈αword␈αtable␈αof␈αhow␈αmany␈αtimes␈αeach␈αASCII
␈↓ α∧␈↓character␈α
occurs␈α
in␈α
that␈α
left-context.␈α
 When␈α
you␈αare␈α
finished␈α
with␈α
all␈α
the␈α
files␈α
you␈α
want␈αto␈α
include,
␈↓ α∧␈↓type a blank line instead of a file name.

␈↓ α∧␈↓␈↓ β∧The␈αprogram␈αwill␈α
now␈αtype␈αout␈αall␈α
the␈αtotals,␈αwhich␈αyou␈α
want␈αto␈αsuppress␈α
by␈αcontrol-o
␈↓ α∧␈↓or escape-o depending on the type of terminal you are on.

␈↓ α∧␈↓␈↓ β∧After␈α
all␈α
that,␈α
output␈α
will␈α
be␈α
turned␈α
back␈α
on␈α
by␈α
the␈α
program,␈α
it␈α
will␈α
write␈α
out␈α
the␈α
file
␈↓ α∧␈↓HIST.POL␈αwhich␈α
is␈αa␈α
Polish␈αdescription␈αof␈α
all␈αthe␈α
left-contexts␈α(typically␈αthis␈α
file␈αis␈α
between␈α50
␈↓ α∧␈↓and 200 words).

␈↓ α∧␈↓␈↓ β∧After␈α∞that,␈α∞the␈α∞program␈α
will␈α∞begin␈α∞computing␈α∞an␈α
abbreviated␈α∞Huffman␈α∞code␈α∞for␈α
each
␈↓ α∧␈↓left-context,␈α
and␈α
type␈α
on␈α
the␈α
TTY␈α
or␈α
PTY␈α
the␈α
input␈α
bits,␈α
output␈α
bits,␈α
compression-ratio,␈α∞and␈α
a
␈↓ α∧␈↓complete␈α
description␈α
of␈α∞the␈α
Huffman␈α
code␈α∞in␈α
nice␈α
readable␈α
verbose␈α∞format.␈α
 If␈α
you␈α∞are␈α
running
␈↓ α∧␈↓this␈α
through␈α
HOTER␈α
you␈α
will␈α
probably␈α
want␈α∞to␈α
do␈α
a␈α
local␈α
↑O␈α
command␈α
to␈α
suppress␈α∞output␈α
to
␈↓ α∧␈↓TTY␈αwithout␈αaffecting␈αthe␈αcopying␈αof␈αPTY␈αoutput␈αto␈αthe␈αdisk␈αfile.␈α (PTYJOB␈αdoes␈αnot␈αprovide
␈↓ α∧␈↓this␈α⊂facility,␈α⊂only␈α⊂HOTER.)␈α⊂This␈α⊂process␈α⊂will␈α∂probably␈α⊂take␈α⊂about␈α⊂15␈α⊂minutes␈α⊂because␈α⊂of␈α∂the
␈↓ α∧␈↓volume␈α⊃of␈α⊂data␈α⊃being␈α⊂inefficiently␈α⊃copied,␈α⊂all␈α⊃the␈α⊂while␈α⊃the␈α⊂file␈α⊃HUFFS.POL␈α⊂will␈α⊃be␈α⊂written
␈↓ α∧␈↓containing each Huffman code generated (typically about 500-2000 words total).

␈↓ α∧␈↓␈↓ β∧You␈α
now␈α
have␈α
HIST.POL␈α
and␈α
HUFFS.POL␈α
containing␈α
a␈α
complete␈α
description␈α
of␈α
the
␈↓ α∧␈↓code to be used.

␈↓ α∧␈↓␈↓ αT[If␈α∪you␈α∪jumped␈α∪directly␈α∪here,␈α∪there␈α∀are␈α∪several␈α∪useful␈α∪history␈α∪and␈α∪Huffman␈α∀trees␈α∪on
␈↓ α∧␈↓[UP,REM] with extensions *.HIS and *.HUF respectively.]

␈↓ α∧␈↓␈↓ αTSTEP␈α
7␈α∞--␈α
When␈α∞using␈α
the␈α
Crunch-by-pages-and-spindle␈α∞command␈α
in␈α∞SPINDL,␈α
specify
␈↓ α∧␈↓the␈α
files␈αyou␈α
created␈α
in␈αstep␈α
6␈αor␈α
the␈α
files␈αyou␈α
found␈αalready␈α
existing␈α
*.HIS␈αand␈α
*.HUF␈α
--␈αonly
␈↓ α∧␈↓one copy of each different tree will be kept in the spindle that contains files crunched by it.




␈↓ α∧␈↓␈↓ ε|4␈↓ ∧



␈↓ α∧␈↓α␈↓ ε$APPENDIX B

␈↓ α∧␈↓␈↓ β∧All␈α∞existing␈α∂trees␈α∞are␈α∞in␈α∂the␈α∞disk␈α∂area␈α∞[UP,REM].␈α∞ History␈α∂trees␈α∞have␈α∂extension␈α∞.HIS
␈↓ α∧␈↓and␈αHuffman␈α
trees␈αhave␈αextension␈α
.HUF.␈α The␈αfollowing␈α
table␈αtells␈αwhich␈α
are␈αavailable␈αand␈α
what
␈↓ α∧␈↓files they are good for.
␈↓ α∧␈↓	        *.HIS   *.HUF   type of file it is for
␈↓ α∧␈↓	        NOTSEV  NOTIC1  English text (surveyed NOTICE[UP,DOC]) (DEFAULT TREES)
␈↓ α∧␈↓	        NOTSEV  SEVEN1  English text (same left-contexts but larger survey)









































␈↓ α∧␈↓␈↓ ε|5␈↓ ∧